Abstract

We study the probability distribution of stock returns at mesoscopic time lags
(return horizons) ranging from about an hour to about a month. While at shorter
microscopic time lags the distribution has power-law tails, for mesoscopic times the
bulk of the distribution (more than 99% of the probability) follows an exponential
law. The slope of the exponential function is determined by the variance of returns,
which increases proportionally to the time lag. At longer times, the exponential
law continuously evolves into a Gaussian distribution. The exponential-to-Gaussian
crossover is well described by the analytical solution of the Heston model with
stochastic volatility.
1 Introduction

The empirical probability distribution functions (EDFs) for different assets 
have been extensively studied by the econophysics community in recent years 
[1,2,3,4,5,6,7,8,9,10]. Stock and stock-index returns have received special
attention. We focus here on the EDFs of the returns of individual large American
companies from 1993 to 1999, a period without major market disturbances. 
By 'return' we always mean 'log-return', the difference of the logarithms of 
prices at two times separated by a time lag t. 

The time lag t is an important parameter: the EDFs evolve with this
parameter. At micro lags (typically shorter than one hour), effects such as the
discreteness of prices and transaction times, correlations between successive
transactions, and fluctuations in trading rates become important. Power-law
tails of EDFs in this regime have been much discussed in the literature before
[2,3]. At 'meso' time lags (typically from an hour to a month), continuum
approximations can be made, and some sort of diffusion process is plausible,
eventually leading to a normal Gaussian distribution. On the other hand, at
'macro' time lags, the changes in the mean market drifts and macroeconomic
'convection' effects can become important, so simple results are less likely to be
obtained. The boundaries between these domains to an extent depend on the
stock, the market where it is traded, and the epoch. The micro-meso boundary
can be defined as the time lag above which power-law tails constitute a very
small part of the EDF. The meso-macro boundary is more tentative, since
statistical data at long time lags become sparse.

The first result is that we extend to meso time lags a stylized fact known 
since the 19th century [11] (quoted in [12]): with a careful definition of time 
lag t, the variance of returns is proportional to t. 

The second result is that log-linear plots of the EDFs show a prominent
straight-line (tent-shape) character, i.e. the bulk (about 99%) of the
probability distribution of log-returns follows an exponential law. The exponential
law applies to the central part of EDFs, i.e. to log-returns that are not too
large. For the far tails of EDFs, usually associated with power laws at micro
time lags, we do not have enough statistically reliable data points at meso lags
to reach a definite conclusion. Exponential distributions have been reported for
some world markets [4,5,6,7,8,9,10] and briefly mentioned in the book [1] (see
Fig. 2.12). However, the exponential law has not yet achieved the status of a
stylized fact. Perhaps this is because influential work [2,3] has been interpreted
as finding that the individual returns of all the major US stocks for micro to
macro time lags have the same power-law EDFs, if they are rescaled by the
volatility.

The Heston model is a plausible diffusion model with stochastic volatility, 
which reproduces the timelag-variance proportionality and the crossover from 
exponential distribution to Gaussian. This model was first introduced by
Heston, who studied option prices [13]. Later Dragulescu and Yakovenko (DY)
derived a convenient closed-form expression for the probability distribution of 
returns in this model and applied it to stock indexes from 1 day to 1 year [4]. 
The third result is that the DY formula with three lag-independent parameters 
reasonably fits the time evolution of EDFs at meso lags. 

2 Probability distribution of log-returns in the Heston model 

In this section, the Heston model [13] and the DY formula [4] are briefly
summarized. The price $S_t$ of a model stock obeys the stochastic differential
equation of multiplicative Brownian motion:
$dS_t = \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{(1)}$. Here
the subscript $t$ indicates time dependence, $\mu$ is the drift parameter, $W_t^{(1)}$ is
a standard random Wiener process, and $v_t$ is the fluctuating time-dependent
variance. The detrended log-return is defined as $x_t = \ln(S_t/S_0) - \mu t$, although
detrending is a minor correction at meso lags. In the Heston model, the
variance $v_t$ obeys the mean-reverting stochastic differential equation:

$$dv_t = -\gamma (v_t - \theta)\,dt + \kappa \sqrt{v_t}\,dW_t^{(2)}. \qquad (1)$$

Here $\theta$ is the long-time mean of $v$, $\gamma$ is the rate of relaxation to this mean, and $\kappa$
is the variance noise. We take the Wiener processes $W_t^{(1,2)}$ to be uncorrelated.
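The two stochastic differential equations above can be explored numerically. The following is a minimal sketch, not the procedure used in this paper: it assumes a plain Euler-Maruyama discretization, clips negative variance to zero inside the square root, starts the variance at its mean $\theta$, and keeps the Ito drift term $-v_t/2$ of the log-price. The function name and all default choices are hypothetical.

```python
import math
import random


def simulate_heston_returns(gamma, theta, kappa, t, n_steps, n_paths, seed=0):
    """Sample detrended log-returns x_t with an Euler-Maruyama scheme.

    Assumptions (not from the paper): variance starts at its long-time mean
    theta, negative variance is clipped to zero when it enters sqrt(), and
    the two Wiener processes are independent, as stated for Eq. (1).
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    returns = []
    for _ in range(n_paths):
        v = theta
        x = 0.0
        for _ in range(n_steps):
            v_pos = max(v, 0.0)                 # clip before taking sqrt
            dw1 = rng.gauss(0.0, 1.0) * sqrt_dt  # increment of W^(1)
            dw2 = rng.gauss(0.0, 1.0) * sqrt_dt  # increment of W^(2)
            # d ln S - mu dt = -v/2 dt + sqrt(v) dW^(1)  (Ito's lemma)
            x += -0.5 * v_pos * dt + math.sqrt(v_pos) * dw1
            # Eq. (1): mean-reverting variance
            v += -gamma * (v - theta) * dt + kappa * math.sqrt(v_pos) * dw2
        returns.append(x)
    return returns
```

With such samples one can check that the sample variance of the returns stays close to $\theta t$, which is the proportionality between variance and time lag discussed in the text.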

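The DY formula [4] for the probability density function (PDF) $P_t(x)$ is:

$$P_t(x) = \int_{-\infty}^{+\infty} \frac{dk}{2\pi}\, e^{ikx + F_{\tilde t}(k)}, \qquad F_{\tilde t}(k) = \frac{\alpha \tilde t}{2} - \alpha \ln\left[\cosh\frac{\Omega \tilde t}{2} + \frac{\Omega^2 + 1}{2\Omega} \sinh\frac{\Omega \tilde t}{2}\right], \qquad (2)$$

$$\tilde t = \gamma t, \quad \alpha = 2\gamma\theta/\kappa^2, \quad \Omega = \sqrt{1 + (k\kappa/\gamma)^2}, \quad \sigma_t^2 \equiv \langle x_t^2 \rangle = \theta t. \qquad (3)$$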
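As an illustration of how Eqs. (2) and (3) can be evaluated, here is a hedged sketch in plain Python. It rewrites the logarithm in an algebraically equivalent form that avoids overflow of cosh and sinh, and it approximates the Fourier integral by a trapezoid rule on a truncated range of $k$. The function names, the cutoff `k_max`, and the grid size `n` are assumptions made for this example; at short time lags the integrand decays slowly, so a larger cutoff would be needed there.

```python
import math


def heston_log_cf(k, t, gamma, theta, alpha):
    """F_{t~}(k) from Eqs. (2)-(3), in an overflow-safe form."""
    t_tilde = gamma * t
    kappa_sq = 2.0 * gamma * theta / alpha      # from alpha = 2*gamma*theta/kappa^2
    omega = math.sqrt(1.0 + k * k * kappa_sq / gamma ** 2)
    a = 0.5 * omega * t_tilde
    c = (omega * omega + 1.0) / (2.0 * omega)
    # ln[cosh a + c sinh a] = a + ln[(1 + c)/2 + (1 - c)/2 * exp(-2a)]
    log_bracket = a + math.log(0.5 * (1.0 + c) + 0.5 * (1.0 - c) * math.exp(-2.0 * a))
    return 0.5 * alpha * t_tilde - alpha * log_bracket


def heston_pdf(x, t, gamma, theta, alpha, k_max=None, n=4000):
    """P_t(x) from Eq. (2) by trapezoid integration over 0 <= k <= k_max.

    F is real and even in k, so the integral reduces to
    (1/pi) * integral_0^inf cos(k x) exp(F(k)) dk.
    """
    if k_max is None:
        k_max = 40.0 / math.sqrt(theta * t)     # assumed cutoff, in units of 1/sigma_t
    dk = k_max / n
    total = 0.0
    for j in range(n + 1):
        k = j * dk
        weight = 0.5 if j in (0, n) else 1.0    # trapezoid end-point weights
        total += weight * math.cos(k * x) * math.exp(heston_log_cf(k, t, gamma, theta, alpha))
    return total * dk / math.pi
```

Two quick sanity checks follow from the formulas themselves: $F_{\tilde t}(0) = 0$, so the PDF is normalized, and $F_{\tilde t}(k) \approx -\theta t k^2/2$ for small $k$, which reproduces the variance $\theta t$ of Eq. (3).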

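The variance $\sigma_t^2 = \langle x_t^2 \rangle$ (3) of the PDF (2) increases linearly in time, while
$\langle x_t \rangle = 0$. The three parameters of the model are $\gamma$, $\theta$ and $\alpha$. At short and long
time lags, the PDF (2) reduces to exponential (if $\alpha = 1$) and Gaussian [4]:

$$P_t(x) \propto \begin{cases} \exp\left(-|x|\sqrt{2/\theta t}\right), & \tilde t = \gamma t \ll 1, \\ \exp\left(-x^2/2\theta t\right), & \tilde t = \gamma t \gg 1. \end{cases} \qquad (4)$$

In both limits, it scales with the volatility: $P_t(x) = f(x/\sqrt{\langle x_t^2 \rangle}) = f(x/\sqrt{\theta t})$,
where $f$ is the exponential or the Gaussian function.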
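The two limits can be recovered from the function $F_{\tilde t}(k) = \frac{\alpha \tilde t}{2} - \alpha \ln\left[\cosh\frac{\Omega \tilde t}{2} + \frac{\Omega^2+1}{2\Omega}\sinh\frac{\Omega \tilde t}{2}\right]$ by a short expansion. The following is a sketch of that reasoning, keeping only leading terms; it is an illustration and not a substitute for the derivation in [4].

```latex
% Short times: \tilde t \ll 1 and \Omega \tilde t \ll 1, so \cosh \approx 1, \sinh \approx \Omega\tilde t/2.
\begin{align*}
F_{\tilde t}(k)
  &\approx \frac{\alpha\tilde t}{2}
     - \alpha \ln\left[1 + \frac{(\Omega^2 + 1)\,\tilde t}{4}\right]
   = \frac{\alpha\tilde t}{2}
     - \alpha \ln\left[1 + \frac{\tilde t}{2} + \frac{\theta t\, k^2}{2\alpha}\right]
   \approx -\alpha \ln\left[1 + \frac{\theta t\, k^2}{2\alpha}\right], \\
\alpha = 1:\quad
e^{F_{\tilde t}(k)} &\approx \frac{1}{1 + \theta t\, k^2/2}
  \;\Longrightarrow\;
P_t(x) \approx \frac{1}{\sqrt{2\theta t}}\,
  \exp\left(-|x|\sqrt{2/\theta t}\right).
\end{align*}
% Long times: \Omega\tilde t \gg 1 and k\kappa/\gamma \ll 1, so \Omega - 1 \approx (k\kappa/\gamma)^2/2.
\begin{align*}
F_{\tilde t}(k) &\approx \frac{\alpha\tilde t}{2}\,(1 - \Omega)
  \approx -\frac{\alpha\tilde t}{4}\left(\frac{k\kappa}{\gamma}\right)^2
  = -\frac{\theta t\, k^2}{2}
  \;\Longrightarrow\;
P_t(x) \approx \frac{1}{\sqrt{2\pi\theta t}}\,
  \exp\left(-\frac{x^2}{2\theta t}\right).
\end{align*}
```

In the short-time step, the identity $(k\kappa/\gamma)^2\,\tilde t/4 = \theta t k^2/(2\alpha)$ follows from $\alpha = 2\gamma\theta/\kappa^2$, and the Lorentzian in $k$ is the Fourier transform of a two-sided exponential with variance $\theta t$.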



3 Data analysis and discussion 

We analyzed the data from January 1993 to January 2000 for 27 Dow companies,
but show results only for four large-cap companies: Intel (INTC) and Microsoft
(MSFT) traded on NASDAQ, and IBM and Merck (MRK) traded on NYSE. We use two
databases: TAQ to construct the intraday returns and the Yahoo database for
the interday returns. The intraday time lags were chosen at multiples of
5 minutes that divide exactly the 6.5 hours (390 minutes) of the trading day.
The interday returns are as described in [4,5] for time lags from 1 day to
1 month = 20 trading days.

In order to connect the interday and intraday data, we have to introduce an
effective overnight time lag $T_n$. Without this correction, the open-to-close and
close-to-close variances would have a discontinuous jump at 1 day, as shown in
the inset of the left panel of Fig. 1. By taking the open-to-close time to be 6.5
hours, and the close-to-close time to be 6.5 hours + $T_n$, we find that the variance
$\langle x_t^2 \rangle$ is proportional to time $t$, as shown in the left panel of Fig. 1. The slope
gives us the Heston parameter $\theta$ in Eq. (3). $T_n$ is about 2 hours (see Table 1).
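One possible way to carry out this adjustment is sketched below. The text does not spell out the fitting procedure, so everything here is an assumption: overlapping log-returns on an evenly sampled price series, a least-squares line through the origin for the intraday variances, and $T_n$ chosen so that the one-day close-to-close variance falls on that line. All function names are hypothetical.

```python
import math


def log_returns(prices, lag):
    """Overlapping log-returns ln(S[i+lag] / S[i]) for an evenly sampled series."""
    return [math.log(prices[i + lag] / prices[i]) for i in range(len(prices) - lag)]


def variance(xs):
    """Sample variance about the mean, which also removes the drift."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def fit_theta(lags_hours, variances):
    """Slope of variance vs. time lag for a least-squares line through the origin."""
    num = sum(t * v for t, v in zip(lags_hours, variances))
    den = sum(t * t for t in lags_hours)
    return num / den


def effective_overnight_hours(theta, close_to_close_var, trading_hours=6.5):
    """Solve close_to_close_var = theta * (trading_hours + T_n) for T_n."""
    return close_to_close_var / theta - trading_hours
```

For instance, if the intraday slope were 2e-5 per hour and the one-day close-to-close variance were 17e-5 (made-up numbers), the second function would place the overnight gap at about 2 hours.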

In the right panel of Fig. 1, we show the log-linear plots of the cumulative
distribution functions (CDFs) vs. normalized return $x/\sqrt{\theta t}$. The $\mathrm{CDF}_t(x)$ is
defined as $\int_{-\infty}^{x} P_t(x')\,dx'$, and we show $\mathrm{CDF}_t(x)$ for $x < 0$ and $1 - \mathrm{CDF}_t(x)$
for $x > 0$. We observe that CDFs for different time lags $t$ collapse on a single
straight line without any further fitting (the parameter $\theta$ is taken from the
fit in the left panel). More than 99% of the probability in the central part
of the tent-shape distribution function is well described by the exponential
function. Moreover, the collapsed CDF curves agree with the DY formula (4),
$P_t(x) \propto \exp(-|x|\sqrt{2/\theta t})$, in the short-time limit for $\alpha = 1$ [4], which is shown
by the dashed lines.

Fig. 1. Left panel: Variance $\langle x_t^2 \rangle$ vs. time lag $t$. Solid lines: Linear fits $\langle x_t^2 \rangle = \theta t$. Inset:
Variances for MRK before adjustment for the effective overnight time $T_n$. Right panel:
Log-linear plots of CDFs vs. $x/\sqrt{\theta t}$. Straight dashed lines $-|x|\sqrt{2/\theta t}$ are predicted by
the DY formula (4) in the short-time limit. The curves are offset by a factor of 10.
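The collapse onto a single straight line can be made explicit by integrating the exponential law. The short calculation below is a sketch that assumes the normalized short-time form of the PDF for $\alpha = 1$, a two-sided exponential with variance $\theta t$.

```latex
\begin{align*}
P_t(x) &= \frac{1}{\sqrt{2\theta t}}\,\exp\left(-|x|\sqrt{2/\theta t}\right), \\
\mathrm{CDF}_t(x) &= \int_{-\infty}^{x} P_t(x')\,dx'
  = \frac{1}{2}\exp\left(-|x|\sqrt{2/\theta t}\right) \quad (x < 0), \\
1 - \mathrm{CDF}_t(x) &= \frac{1}{2}\exp\left(-|x|\sqrt{2/\theta t}\right) \quad (x > 0), \\
\ln\left[\text{either tail}\right] &= -\ln 2 - \sqrt{2}\,\frac{|x|}{\sqrt{\theta t}}.
\end{align*}
```

In the variable $x/\sqrt{\theta t}$ the logarithm of each tail is a straight line with slope $\mp\sqrt{2}$ and no remaining dependence on $t$, so curves for different time lags fall on top of each other once $\theta$ is fixed.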

Because the parameter $\gamma$ drops out of the asymptotic Eq. (4), it can be
determined only from the crossover regime between short and long times,
which is illustrated in the left panel of Fig. 2. We determine $\gamma$ by fitting the
characteristic function $\tilde P_t(k)$, a Fourier transform of $P_t(x)$ with respect to $x$.
The theoretical characteristic function of the Heston model is $\tilde P_t(k) = e^{F_{\tilde t}(k)}$
(2). The empirical characteristic functions (ECFs) can be constructed from
the data series by taking the sum $\tilde P_t(k) = \mathrm{Re} \sum_{x_t} \exp(-ikx_t)$ over all returns
$x_t$ for a given $t$ [14]. Fits of ECFs to the DY formula (2) are shown in the right
panel of Fig. 2. The parameters determined from the fits are given in Table 1.

Fig. 2. Left panel: Theoretical CDFs for the Heston model plotted vs. $x/\sqrt{\theta t}$. The
curves interpolate between the short-time exponential and long-time Gaussian scalings.
Right panel: Comparison between empirical (points) and the DY theoretical (curves)
characteristic functions $\tilde P_t(k)$.

Fig. 3. Comparison between the 1993-1999 Intel data (points) and the DY formula (2)
(curves) for PDF (left panel) and CDF (right panel).
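A hedged sketch of this comparison is given below. The ECF is the real part of the sum of $\exp(-ikx_t)$ over the returns, divided here by the number of returns so that it equals 1 at $k = 0$; that normalization is an assumption of this example. The fit is shown as a plain least-squares grid search over candidate values of $\gamma$ at a single time lag, which is only one of many possible procedures and is not claimed to be the one used for Table 1. All function names are hypothetical.

```python
import math


def empirical_cf(returns, k):
    """Re of the average of exp(-i k x), i.e. the mean of cos(k x)."""
    return sum(math.cos(k * x) for x in returns) / len(returns)


def heston_cf(k, t, gamma, theta, alpha=1.0):
    """Theoretical characteristic function exp(F_{t~}(k)) of the Heston model."""
    t_tilde = gamma * t
    kappa_sq = 2.0 * gamma * theta / alpha      # from alpha = 2*gamma*theta/kappa^2
    omega = math.sqrt(1.0 + k * k * kappa_sq / gamma ** 2)
    a = 0.5 * omega * t_tilde
    c = (omega * omega + 1.0) / (2.0 * omega)
    # overflow-safe form of ln[cosh a + c sinh a]
    log_bracket = a + math.log(0.5 * (1.0 + c) + 0.5 * (1.0 - c) * math.exp(-2.0 * a))
    return math.exp(0.5 * alpha * t_tilde - alpha * log_bracket)


def fit_gamma(ks, ecf_values, t, theta, gamma_grid, alpha=1.0):
    """Pick the gamma on the grid that minimizes the squared ECF mismatch."""
    def cost(g):
        return sum((heston_cf(k, t, g, theta, alpha) - e) ** 2
                   for k, e in zip(ks, ecf_values))
    return min(gamma_grid, key=cost)
```

In practice the ECF values would come from `empirical_cf` applied to the measured returns at each $k$, with $\theta$ fixed beforehand from the variance slope, and the mismatch could be summed over several time lags rather than one.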

In the left panel of Fig. 3 we compare the empirical PDF $P_t(x)$ with the
DY formula (2). The agreement is quite good, except for the very short time
lag of 5 minutes, where the tails are visibly fatter than exponential. In order
to make a more detailed comparison, we show the empirical CDFs (points)
with the theoretical DY formula (lines) in the right panel of Fig. 3. We see
that, for micro time lags of the order of 5 minutes, the power-law tails are
significant. However, for meso time lags, the CDFs fall onto straight lines in
the log-linear plot, indicating an exponential law. For even longer time lags, they
evolve into the Gaussian distribution, in agreement with the DY formula (2)
for the Heston model. To illustrate the point further, we compare empirical
and theoretical data for several other companies in Fig. 4.

In the empirical CDF plots, we actually show the ranking plots of log-returns
$x_t$ for a given $t$, so each point in the plot represents a single instance of price
change. Thus, the last one or two dozen points at the far tail of each
plot constitute a statistically small group and show a large amount of noise.
Statistically reliable conclusions can be made only about the central part of
the distribution, where the points are dense, but not about the far tails.

Table 1

Fitting parameters of the Heston model with $\alpha = 1$ for the 1993-1999 data.
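A ranking plot of this kind can be built directly from the sorted returns. The sketch below is one straightforward construction, with a hypothetical function name: negative returns are ranked from the most negative upward to give the CDF, and positive returns from the largest downward to give 1 - CDF, so that every return contributes exactly one point.

```python
def ranking_tail_points(returns):
    """Return (left, right) lists of (x, probability) points.

    left:  CDF(x) for x < 0, estimated as (rank from the most negative) / N
    right: 1 - CDF(x) for x > 0, estimated as (rank from the largest) / N
    """
    n = len(returns)
    negatives = sorted(x for x in returns if x < 0)                    # most negative first
    positives = sorted((x for x in returns if x > 0), reverse=True)    # largest first
    left = [(x, (rank + 1) / n) for rank, x in enumerate(negatives)]
    right = [(x, (rank + 1) / n) for rank, x in enumerate(positives)]
    return left, right
```

The first few entries of each list have probabilities 1/N, 2/N, and so on, each resting on a single observed return, which is why the far tails of such plots are noisy.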



4 Conclusions 



We have shown that in the mesoscopic range of time lags, the probability
distribution of financial returns interpolates between the exponential and
Gaussian laws. The time range where the distribution is exponential depends on
the particular company, but it is typically between an hour and a few days. Similar
exponential distributions have been reported for the Indian [6], Japanese [7],
German [8], and Brazilian markets [9,10], as well as for the US market [4,5]
(see also Fig. 2.12 in [1]). The DY formula [4] for the Heston model [13]
captures the main features of the probability distribution of returns from an hour
to a month with a single set of parameters. We believe that econophysicists
should be aware of the presence of the exponential distribution and recognize
it as another "stylized fact" in the set of analytical tools for financial data
analysis.

We thank Chuck Lahaie from the Robert H. Smith School of Business at 
UMD for help with the TAQ database. 
Fig. 4. Comparison between empirical data (symbols) and the DY formula (2) (lines)
for CDF (left panels) and characteristic function (right panels).